1. INTRODUCTION

Consider the standard model for multiple linear regression

$$Y = X\beta + \varepsilon, \qquad (1.1)$$

where $Y$ is an $n \times 1$ vector of observations on the dependent variable, $X$ is an $n \times p$ non-stochastic matrix of rank $p$, $\beta$ is a $p \times 1$ unknown vector of regression coefficients, and $\varepsilon$ is an $n \times 1$ random vector of errors with $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I$.

The ordinary least squares (OLS) estimate of $\beta$ is given by

$$\hat\beta = (X'X)^{-1}X'Y. \qquad (1.2)$$

It is well known that in the presence of multicollinearity among the independent variables, the OLS estimate $\hat\beta$ is often very unstable and has, among its undesirable properties, a large mean squared error (MSE).

Hoerl and Kennard (1970a, b) introduced a class of estimators, termed ridge estimators, defined by

$$\hat\beta(k) = (X'X + kI)^{-1}X'Y, \qquad k > 0. \qquad (1.3)$$
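To make (1.2) and (1.3) concrete, here is a minimal NumPy sketch that computes both estimates on a small synthetic, nearly collinear design. The data, seed, and function names are hypothetical illustrations, not the study's code; the sketch solves the normal equations instead of forming an explicit inverse.

```python
import numpy as np

def ols_estimate(X, Y):
    """OLS estimate (1.2), computed by solving (X'X) b = X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ridge_estimate(X, Y, k):
    """Ridge estimate (1.3), computed by solving (X'X + kI) b = X'Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)

# Hypothetical nearly collinear design: column 2 is almost a copy of column 1.
rng = np.random.default_rng(0)
n = 100
z = rng.standard_normal(n)
X = np.column_stack([z, z + 0.01 * rng.standard_normal(n), rng.standard_normal(n)])
beta = np.array([1.0, 1.0, 0.5])
Y = X @ beta + rng.standard_normal(n)

b_ols = ols_estimate(X, Y)
b_ridge = ridge_estimate(X, Y, k=0.5)
print("OLS:  ", b_ols)
print("ridge:", b_ridge)
```

Setting k = 0 recovers the OLS estimate, and for any k > 0 the ridge vector has a smaller Euclidean norm than the OLS vector, which is the shrinkage that (1.3) introduces.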

Originally, (1.3) was used in conjunction with a "ridge trace" to find a constant $k$ that would stabilize the components of the vector $\hat\beta(k)$. In so doing, it was found that the stabilized $\hat\beta(k)$ often also had a smaller MSE than the OLS $\hat\beta$.

Subsequent literature on ridge estimation focused attention on the MSE properties of $\hat\beta(k)$, where values of $k$ that are functions of the data $X$ and $Y$ have been proposed and studied.

For a review of the ridge regression literature and some recent simulation results, see, e.g., (Hocking 1976; Dempster, Schatzoff, and Wermuth 1977; and Gunst and Mason 1977).

Although the large body of work on ridge regression deals almost exclusively with the MSE properties or performance of various ridge estimators, MSE considerations should remain secondary in the ridge regression context, even though the MSE criterion is an important estimation criterion in its own right. There are two main reasons for this perspective. (1) The primary intent of a ridge analysis is to remove some undesirable effects of multicollinearity. This can be accomplished either by keeping all of the original independent variables and attempting to minimize the MSE using (1.3) as the form of the point estimate, or by using ridge regression as a tool for detecting, and then removing, one or more independent variables that cause the multicollinearity. (2) If MSE were the primary concern of the analysis, there would be considerably less incentive to restrict one's attention to the class of ridge estimators, for which very few theoretical results are known, whereas several other classes of estimators (relatives of the James-Stein (1961) estimator) have been proven to dominate the OLS estimate in MSE.
For the above reasons, we carried out a simulation experiment (involving several ridge estimators) that differed from previous studies in several respects. First, we did not attempt a comparative study involving a large number of estimators, as was done by Dempster, Schatzoff and Wermuth (1977), nor did we restrict our attention to ridge estimators alone. Instead, we selected several ridge estimators that have previously been reported to have good MSE properties, as well as several estimators outside the ridge class that seem to hold promise for theoretical reasons. Second, we controlled the relevant factors and parameters of the problem over a more comprehensive region than those considered in previously reported studies. Third, we studied in detail the empirical sampling distributions of the squared loss incurred by each method over the cells of a factorial design. Fourth, in addition to the empirical MSE, we used several new calibration quantities in assessing and comparing the performance of the various estimators.

Within the scope of the present study, we found that, with rare exceptions (in nearly orthogonal cases of the design matrix), all of the estimators yield a better empirical MSE than the OLS estimate, often by substantial margins, especially under conditions of high multicollinearity among the independent variables. On a relative basis, different estimators excel in different regions of the control-factor space, but none was found to be the best in all regions. In particular, two of the best estimators for highly multicollinear data were among the worst when the design matrix $X$ was nearly orthogonal. Several of the estimators have been proven theoretically to dominate the OLS $\hat\beta$ in MSE. However, none of the existing theoretical results enables one to assess the magnitudes of improvement over our control-factor space.

We found in many instances that the estimators whose theoretical MSEs are unknown performed considerably better (in empirical MSE) than those for which theoretical MSE results exist.

2. ESTIMATORS CONSIDERED IN THIS STUDY

Several biased estimators were considered in the study along with the OLS estimator. The selection process was necessarily subjective. One of the factors influencing our choice was our a priori judgement of the potential of the estimators, guided by studies reported in the most recent literature. Moreover, the fact that simulation studies had been done on certain individual estimators motivated us to analyze the relative performance of these estimators when studied collectively. This, we felt, would provide some insight, with respect to certain measures of performance, into the dominance of certain estimators over others in some ranges of the factor space, which would enable us to develop a guideline for choosing estimators for a particular problem.

The estimators that we chose may be grouped into two categories: single-parameter and multi-parameter families. Within these groups we have considered some ridge estimators and some other estimators that were originally proposed in the context of estimating the mean of a multivariate normal distribution. The James and Stein (1961) estimator and the Baranchik (1970) class of estimators are examples of the latter type. The equivalence of ridge-type estimators and Baranchik-type estimators has been shown for the orthogonal case ($X'X = I$) and may be found in (Mitra 1977). Two new classes of ridge estimators were derived from the Baranchik class of estimators.
The Baranchik class of estimators for the mean of a multivariate normal distribution is of the form

$$\hat\theta_i = \left(1 - \frac{\tau(s)}{s}\right) x_i, \qquad i = 1, \ldots, p, \qquad (2.1)$$

where $s = x'x$, $x = (x_1, x_2, \ldots, x_p)'$, and $\tau(s)$ is a function of $s$. For each of $p$ parameters $\theta_1, \theta_2, \ldots, \theta_p$ we observe an independent normal variate $x_i \sim N(\theta_i, 1)$. Baranchik (1970) found sufficient conditions on $\tau(s)$ that guarantee the estimator has smaller risk than the usual estimator $x$. Efron and Morris (1976) found necessary and sufficient conditions for estimators of the form (2.1) to dominate $x$ in risk. They considered the case $x_i \sim N(\theta_i, \sigma^2)$, where $\sigma^2$ may be unknown. In their formulation, $s$ in (2.1) is replaced by a statistic $F$ in which $x'x$ is scaled by an independent estimate of $\sigma^2$.
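The following sketch evaluates (2.1) and illustrates the orthogonal-case link to ridge regression mentioned above: when $X'X = I$, (1.3) reduces to $\hat\beta/(1+k)$, so a shrinkage factor $1 - \tau/s$ corresponds to $k = \tau/(s - \tau)$, provided $s > \tau$. The function names are ours, and the choice $\tau(s) = p - 2$ is the classical James-Stein case with unit variance.

```python
import numpy as np

def baranchik(x, tau):
    """Estimator (2.1): shrink x by the factor 1 - tau(s)/s, where s = x'x."""
    s = float(x @ x)
    return (1.0 - tau(s) / s) * x

def equivalent_ridge_k(tau_value, s):
    """k with 1/(1 + k) = 1 - tau/s, i.e. k = tau/(s - tau); requires s > tau."""
    if s <= tau_value:
        raise ValueError("no nonnegative ridge constant: need s > tau")
    return tau_value / (s - tau_value)

x = np.array([1.0, -2.0, 0.5, 3.0])
p = x.size
js = baranchik(x, lambda s: p - 2)    # James-Stein: tau(s) = p - 2
k = equivalent_ridge_k(p - 2, float(x @ x))
print(js, x / (1.0 + k))              # both vectors agree: ridge shrinkage when X'X = I
```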

The two new classes of estimators, labeled Mitra 1 (M1) and Mitra 2 (M2), are special cases of the above form and will therefore dominate the OLS estimator in MSE in the orthogonal case. For the Mitra 1 estimator (Mitra 1977), the ridge parameter $k$ in (1.3) takes the form

$$k = \frac{(p-2)\left[t - c_1/(F + c_2)\right]}{F - (p-2)\left[t - c_1/(F + c_2)\right]},$$

where

$$F = \frac{\hat\beta' X'X \hat\beta\,(n-p+2)}{(n-p)\,\hat\sigma^2},$$

$\hat\beta$ is the OLS estimator, and $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$.

For the simulation, the values of the parameters were chosen to be $t = 1$, $c_1 = 1$, and $c_2 = 2$, yielding

$$k = \frac{p-2}{F(F+2)/(F+1) - (p-2)}. \qquad (2.2)$$
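As a check on (2.2), the sketch below codes the general Mitra 1 form and the closed form and compares them. It also codes $F$ as written in the text above; since that formula was restored from a damaged scan, treat `f_statistic` as our reading rather than a confirmed definition. The sketch does not handle $F \le \tau$, where $k$ is negative or undefined, because the text does not say how the study treated that case.

```python
import numpy as np

def f_statistic(beta_hat, XtX, sigma2_hat, n):
    """F as given in the text (our reading): b'X'Xb (n-p+2) / ((n-p) sigma2_hat)."""
    p = beta_hat.size
    return float(beta_hat @ XtX @ beta_hat) * (n - p + 2) / ((n - p) * sigma2_hat)

def mitra1_k(F, p, t=1.0, c1=1.0, c2=2.0):
    """General Mitra 1 form: k = tau / (F - tau), tau = (p-2)[t - c1/(F + c2)]."""
    tau = (p - 2) * (t - c1 / (F + c2))
    return tau / (F - tau)

def mitra1_k_eq22(F, p):
    """Closed form (2.2), i.e. t = 1, c1 = 1, c2 = 2."""
    return (p - 2) / (F * (F + 2) / (F + 1) - (p - 2))

for p in (3, 6, 10):
    for F in (20.0, 50.0, 100.0):
        print(p, F, mitra1_k(F, p), mitra1_k_eq22(F, p))
```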

For the Mitra 2 estimator the ridge parameter is of the form

$$k = \frac{(p-2)\left[t - c_1 e^{-c_2 F}\right]}{F - (p-2)\left[t - c_1 e^{-c_2 F}\right]}.$$

For the simulation, the parameter values were chosen to be $t = 1$, $c_1 = 1$, and $c_2 = 1$, yielding

$$k = \frac{(p-2)(1 - e^{-F})}{F - (p-2)(1 - e^{-F})}. \qquad (2.3)$$
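A similar sketch for (2.3). It also checks that, when $X'X = I$, the implied ridge shrinkage $1/(1+k)$ equals the Baranchik factor $1 - \tau/F$ with $\tau = (p-2)(1 - e^{-F})$. Since $e^{-F} \to 0$, $\tau$ approaches $p - 2$ for large $F$. The function names and the test values $F = 25$, $p = 6$ are arbitrary illustrations.

```python
import math

def mitra2_k(F, p, t=1.0, c1=1.0, c2=1.0):
    """General Mitra 2 form: k = tau / (F - tau), tau = (p-2)[t - c1 exp(-c2 F)]."""
    tau = (p - 2) * (t - c1 * math.exp(-c2 * F))
    return tau / (F - tau)

def mitra2_k_eq23(F, p):
    """Closed form (2.3), i.e. t = 1, c1 = 1, c2 = 1."""
    g = (p - 2) * (1.0 - math.exp(-F))
    return g / (F - g)

def shrinkage_factor(k):
    """Factor 1/(1 + k) that (1.3) applies to the OLS estimate when X'X = I."""
    return 1.0 / (1.0 + k)

F, p = 25.0, 6
k = mitra2_k(F, p)
print(k, mitra2_k_eq23(F, p), shrinkage_factor(k))
```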

A multi-parameter or generalized ridge estimator takes the form

$$\hat\beta(K) = (X'X + K)^{-1}X'Y, \qquad (2.4)$$

where $K = \operatorname{diag}(k_1, k_2, \ldots, k_p)$.
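The sketch below implements (2.4) literally, with $K$ added to $X'X$ in the original coordinates. It also offers an alternative reading: in our understanding, Hoerl and Kennard state their generalized ridge in the canonical (eigenvector) coordinates of $X'X$, which corresponds to replacing $K$ by $QKQ'$ with $X'X = Q\Lambda Q'$. The `canonical` flag is our own addition. Check against each reference in Table 1 which reading a given estimator uses.

```python
import numpy as np

def generalized_ridge(X, Y, k, canonical=False):
    """Generalized ridge (2.4) with K = diag(k_1, ..., k_p).

    canonical=False: (X'X + K)^{-1} X'Y, K used as written.
    canonical=True:  (X'X + Q K Q')^{-1} X'Y, where X'X = Q diag(lam) Q' and
                     k_i pairs with the i-th eigenvalue in ascending order.
    """
    XtX = X.T @ X
    K = np.diag(np.asarray(k, dtype=float))
    if canonical:
        _, Q = np.linalg.eigh(XtX)       # eigenvectors, eigenvalues ascending
        K = Q @ K @ Q.T
    return np.linalg.solve(XtX + K, X.T @ Y)

# Hypothetical data for illustration only.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
Y = X @ np.array([1.0, 0.0, -1.0]) + rng.standard_normal(50)
b_plain = generalized_ridge(X, Y, [0.1, 0.5, 2.0])
b_canon = generalized_ridge(X, Y, [0.1, 0.5, 2.0], canonical=True)
```

With all $k_i$ equal, both readings reduce to the single-parameter ridge estimate (1.3).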

Table 1 gives a listing of the estimators that were used in the simulation, along with their references. The reader is referred to these references for explicit expressions of these estimators, which are omitted from this paper.

HK and GM are special cases of (2.4), while B, B+, V, and V+ are of the form (2.4) but with $K$ not necessarily diagonal.
3. SIMULATION DESIGN

Many simulation studies on the performance of ridge and other biased estimators have appeared in the recent literature. Among these are the studies of Gunst and Mason (1977), Dempster, Schatzoff and Wermuth (1977), Lawless and Wang (1976), Hoerl, Kennard and Baldwin (1975), and McDonald and Galarneau (1975), to cite but a few. Our study differs from these and other simulation studies mainly in two respects. First, we considered the relevant parameters of the problem (dimensionality, degree of multicollinearity, etc.) over a more comprehensive range of values and combinations than previous studies. Second, we report the empirical performance of the various estimators in greater detail, i.e., using several measures of performance in addition to the usually reported average squared loss (empirical MSE).

3.1 Generation of $X$ and $\beta$.

In the simulation studies of ridge regression that have appeared to date, there does not appear to be any standard method of simulating data and parameters from the linear model $Y = X\beta + \varepsilon$. In principle, the performance of ridge estimators depends on the design matrix $X$ only through the eigenvalues of the matrix $X'X$, as was pointed out by Efron and Morris (1977, p. 92), so that $X$, $\varepsilon$, or $Y$ need not actually be generated, as was the case in the study of Dempster, Schatzoff and Wermuth (1977). In some studies, e.g., (Hoerl, Kennard and Baldwin 1975), particular sets of real data from other published sources were used for $X$. We chose to follow the procedure used and described by McDonald and Galarneau (1975, p. 409) to generate the $X$'s and $\beta$'s in this study, because the procedure provides a reasonable method of choosing an $X$ (and the most and least favorable $\beta$'s) in any given dimension with a specified multicollinearity structure. Basically, for each given dimension $p$ and correlation coefficient $\rho$, the elements $x_{ij}$ of $X$ ($i = 1, \ldots, 100$; $j = 1, \ldots, p$) are assumed (theoretically) to have the intraclass correlation structure; that is, the correlation matrix of the columns of $X$ is

$$\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}. \qquad (3.1)$$


A sample size of 100 was used (as in McDonald and Galarneau 1975) so that the sample correlation matrix of $X$ closely resembles the form (3.1). For each $X$, two unit-length vectors $\beta$ are generated, corresponding in some sense to the "most favorable" choice ($\beta$ is the normalized eigenvector corresponding to the largest eigenvalue of $X'X$) and the "least favorable" choice ($\beta$ is the normalized eigenvector corresponding to the smallest eigenvalue of $X'X$) for that $X$. See (McDonald and Galarneau 1975, p. 409).
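A sketch of this step follows. The paper does not spell out McDonald and Galarneau's recipe, so the generator below uses a common construction for equicorrelated columns (a shared N(0, 1) factor mixed with independent N(0, 1) noise), which gives population pairwise correlation $\rho$. Centering and scaling each column to unit length, so that $X'X$ is the sample correlation matrix, are also our assumptions. The function names are hypothetical.

```python
import numpy as np

def equicorrelated_X(n, p, rho, rng):
    """n x p design whose columns have population pairwise correlation rho.

    Assumed construction: x_ij = sqrt(1 - rho) z_ij + sqrt(rho) z_i,p+1 with
    independent N(0, 1) z's. Columns are then centered and scaled to unit
    length, so X'X is the sample correlation matrix (compare (3.1)).
    """
    Z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1.0 - rho) * Z[:, :p] + np.sqrt(rho) * Z[:, [p]]
    X = X - X.mean(axis=0)
    return X / np.linalg.norm(X, axis=0)

def favorable_betas(X):
    """Unit-length eigenvectors of X'X for its largest and smallest eigenvalues."""
    lam, Q = np.linalg.eigh(X.T @ X)    # eigenvalues in ascending order
    return Q[:, -1], Q[:, 0]            # (most favorable, least favorable)

rng = np.random.default_rng(1975)
X = equicorrelated_X(100, 6, 0.9, rng)
beta_most, beta_least = favorable_betas(X)
R = X.T @ X
print(np.round(R, 2))
```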


3.2 Choice of Parameters and Method of Replication.

The dimensionality of $X$ (the number of independent variables), denoted by $p$, was taken to be 3, 6, and 10, corresponding to what we considered to be low, moderate, and high dimensions. For each $p$, six values of $\rho$ were chosen (as a function of the multicollinearity index $\alpha$ described in Subsection 3.3), ranging from $\rho = 0$ to $\rho > 0.99$. For each combination of $p$ and $\rho$, an $X$ matrix and two coefficient vectors $\beta$ were generated as described in the previous subsection. For each of these combinations of $(p, \rho, X, \beta)$, random error vectors $\varepsilon$ (and hence $Y$) were replicated 100 times from each of 5 normal distributions $N(0, \sigma^2 I)$. The values of $\sigma^2$ considered in this study were 0.001, 0.01, 0.1, 0.2, and 1.0, representing different magnitudes of $\sigma^2$ relative to $\beta$ (which has unit length).
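The factorial layout described above can be tallied as follows: 18 design matrices (one per $(p, \rho)$ pair), 180 cells in total, and 100 error vectors per cell. The constant names are ours.

```python
from itertools import product

P_LEVELS = (3, 6, 10)                          # dimension p
ALPHA_LEVELS = (1, 2, 5, 10, 50, 100)          # multicollinearity index alpha (fixes rho)
BETA_LEVELS = ("most favorable", "least favorable")
SIGMA2_LEVELS = (0.001, 0.01, 0.1, 0.2, 1.0)   # error variance sigma^2
N_REPS = 100                                   # error vectors per cell

cells = list(product(P_LEVELS, ALPHA_LEVELS, BETA_LEVELS, SIGMA2_LEVELS))
n_designs = len(P_LEVELS) * len(ALPHA_LEVELS)  # one X matrix per (p, rho)
print(n_designs, len(cells), len(cells) * N_REPS)  # 18 180 18000
```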

3.3 Index of Multicollinearity $\alpha$.

Given the correlation structure (3.1) of $X$, McDonald and Galarneau (1975) used $\rho$ as their measure of multicollinearity in $X$. We introduced an index $\alpha$, a function of $p$ and $\rho$, as our measure of multicollinearity, derived from the following heuristics.

In general, a reasonable measure of multicollinearity is

$$\delta = \sum_{i=1}^{p} \frac{1}{\lambda_i},$$

where $\lambda_i$, $i = 1, \ldots, p$, are the eigenvalues of $X'X$; note that $\sigma^2\delta$ is the MSE of the OLS estimate. For $X'X$ of the form (3.1), whose eigenvalues are $1 - \rho$ (with multiplicity $p - 1$) and $1 + (p-1)\rho$,

$$\delta = \frac{p-1}{1-\rho} + \frac{1}{1 + (p-1)\rho}, \qquad (3.2)$$

so that $p \le \delta < \infty$. In particular, if $\rho = 0$, then $\delta = p$. We therefore introduced a "normalized" measure

$$\alpha = \delta / p, \qquad (3.3)$$

so that $\alpha = 1$ when $\rho = 0$ for all $p$, and $1 \le \alpha < \infty$. It can easily be seen from (3.3) and (3.2) that, given $p$ and $\alpha$, the corresponding $\rho \ge 0$ is

$$\rho = \frac{(p-2)(\alpha-1) + \left[(p-2)^2(\alpha-1)^2 + 4\alpha(\alpha-1)(p-1)\right]^{1/2}}{2\alpha(p-1)}. \qquad (3.4)$$

In this study, we chose the values of $\alpha$ to be 1, 2, 5, 10, 50, and 100, which correspond to the values of $\rho$ shown in Table 2.
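The sketch below codes (3.2) to (3.4) and checks that (3.4) inverts (3.3). We believe it would reproduce the $\rho$ values of Table 2, but that is an expectation, not something verified against the table. The function names are ours.

```python
import math

def delta_equicorrelated(p, rho):
    """(3.2): sum of 1/lambda_i for the intraclass matrix (3.1)."""
    return (p - 1) / (1 - rho) + 1 / (1 + (p - 1) * rho)

def alpha_index(p, rho):
    """(3.3): alpha = delta / p."""
    return delta_equicorrelated(p, rho) / p

def rho_from_alpha(p, alpha):
    """(3.4): the root rho >= 0 of alpha * p = delta(p, rho)."""
    a = alpha
    disc = (p - 2) ** 2 * (a - 1) ** 2 + 4 * a * (a - 1) * (p - 1)
    return ((p - 2) * (a - 1) + math.sqrt(disc)) / (2 * a * (p - 1))

for p in (3, 6, 10):
    print(p, [round(rho_from_alpha(p, a), 4) for a in (1, 2, 5, 10, 50, 100)])
```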
4. SIMULATION RESULTS

Several performance measures were used to study the relative performance of the fourteen estimators (including OLS).

4.1 Comparisons to Optimal Values.

Given $\beta$ (the unknown coefficient vector to be estimated, which is known in a simulation study) and a realization of the error vector $\varepsilon$, values of $k$ can be determined such that the squared loss

$$L(k) = (\hat\beta(k) - \beta)'(\hat\beta(k) - \beta),$$

where $\hat\beta(k)$ is of the form (1.3), is minimized over various intervals for $k$. These minimizing values of $k$ can be said to yield optimal single-parameter ridge estimates of $\beta$. Such optimal estimates are of course not realizable in practice, because they depend on the particular realization of $\varepsilon$ as well as on the true value of $\beta$. However, they are realizable in a simulation study and serve as useful calibration quantities, because they yield absolute lower bounds for $L(k)$ over the entire class of single-parameter ridge estimators, irrespective of how $k$ is determined from the data.

McDonald and Galarneau (1975) considered such optimal ridge estimates, with $k$ restricted to the unit interval $[0, 1]$. The restriction $k \ge 0$ was natural, because it is generally included in the definition of a ridge estimator. The restriction $k \le 1$ was quite arbitrary, however. We considered two optimal ridge estimates: (1) the (global) optimal, denoted by $\hat\beta(k^*)$, where $k^*$ minimizes $L(k)$ for $-\infty < k < \infty$, and (2) the positive-part optimal, denoted by $\hat\beta(k^*_+)$, where $k^*_+$ minimizes $L(k)$ for $k \ge 0$. Dempster, Schatzoff, and Wermuth (1977) also used the notion of an optimal ridge estimate. However, the value of $k$ for their optimal estimate is determined from an expression that minimizes the MSE (using the OLS estimates) instead of minimizing the squared loss for each realization of $\varepsilon$.

Three indices, $L(k)/L(k^*) - 1$, $L(k)/L(k^*_+) - 1$, and $1 - L(k)/L(0)$, were used to compare the performance of an estimator with that of the optimal ridge estimator, the positive-part optimal ridge estimator, and the OLS estimator, respectively, for each sample. A sample of such results is shown in Table 3. From this table we find that, for this sample, methods M, HKB, JS+, B+, and B do worse than the OLS in squared loss. As for performance relative to the optimal ridge estimate, we note from the same table that M has an index value of 2.374. On the other hand, HK did better than the single-parameter optimal ridge estimator, which is possible because HK is a multi-parameter ridge estimator. For this sample, $k^* > 0$, so that $k^* = k^*_+$.
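The following sketch computes $L(k)$, approximate values of $k^*_+$ and $k^*$, and the three indices for one simulated sample. It uses $X'X = Q\Lambda Q'$, under which $\hat\beta(k) = Q\,\operatorname{diag}(\lambda_i/(\lambda_i + k))\,Q'\hat\beta$. Grid search is our simplification: it can miss a very narrow minimum. The "global" search also covers only $k > -\lambda_{\min}$, where $X'X + kI$ stays positive definite, whereas the text allows any real $k$. The data and the fixed $k = 0.5$ standing in for a data-based estimator are hypothetical.

```python
import numpy as np

def loss_function(X, Y, beta):
    """Return L(k) = ||beta_hat(k) - beta||^2 for the ridge estimate (1.3)."""
    XtX = X.T @ X
    lam, Q = np.linalg.eigh(XtX)
    c = Q.T @ np.linalg.solve(XtX, X.T @ Y)   # OLS estimate in eigen-coordinates
    g = Q.T @ beta                            # true beta in eigen-coordinates
    def L(k):
        return float(np.sum((lam * c / (lam + k) - g) ** 2))
    return L, lam

def grid_min(L, ks):
    vals = np.array([L(k) for k in ks])
    i = int(np.argmin(vals))
    return float(ks[i]), float(vals[i])

def optimal_ridge(X, Y, beta, n_grid=4001):
    """Approximate (k*, L(k*)) and (k*_+, L(k*_+)) by grid search."""
    L, lam = loss_function(X, Y, beta)
    pos = np.concatenate([[0.0], np.logspace(-6, 3, n_grid)])
    neg = np.linspace(-0.999 * lam.min(), 0.0, n_grid)   # assumed search region
    k_plus, L_plus = grid_min(L, pos)
    k_neg, L_neg = grid_min(L, neg)
    k_star, L_star = (k_neg, L_neg) if L_neg < L_plus else (k_plus, L_plus)
    return L, (k_star, L_star), (k_plus, L_plus)

def indices(L_est, L_star, L_plus, L_ols):
    """The three indices of Section 4.1."""
    return L_est / L_star - 1.0, L_est / L_plus - 1.0, 1.0 - L_est / L_ols

# Hypothetical sample: equicorrelated columns, unit-length beta.
rng = np.random.default_rng(7)
n, p = 100, 4
Z = rng.standard_normal((n, p + 1))
X = np.sqrt(0.1) * Z[:, :p] + np.sqrt(0.9) * Z[:, [p]]
beta = np.ones(p) / 2.0
Y = X @ beta + rng.standard_normal(n)
L, (k_star, L_star), (k_plus, L_plus) = optimal_ridge(X, Y, beta)
print(k_star, k_plus, indices(L(0.5), L_star, L_plus, L(0.0)))
```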

These optimal ridge estimates also serve a useful purpose in MSE comparisons. Along with the empirical MSE of the different estimators over the replications, the empirical MSE of the optimal ridge estimates was computed in the simulation. One such result is shown in Table 4. Such comparisons give a measure of how much reduction in MSE is possible, even though it is not attained. In particular, from Table 4, for $\alpha = 1$, we find HKB to be the best, with an MSE of 1.20, while the OLS has an empirical MSE of 2.89. The OLS (Expected) row gives the theoretical MSE of the OLS, $\sigma^2 \sum_{i=1}^{p} 1/\lambda_i$. Hence, the best estimator in this case (HKB) has an MSE a little less than one half ($1.20/2.89 = 0.415$) of that of the OLS. On the other hand, compared with the lower bound of the MSE, the MSE of HKB is 2.4 times the MSE of the optimal ridge estimator.

From Table 4 we may also note the relation between the MSE of the optimal and that of the positive-part optimal ridge estimator. As the level of multicollinearity increases, both MSEs follow the same trend, but the ratio of the MSE of the positive-part optimal to that of the optimal ridge estimator increases from 1.1 to 17.0. This implies that, at high levels of multicollinearity, a large reduction in $L(k)$ is potentially achievable by not restricting $k$ to be nonnegative. However, when $\beta$ is least favorable, we found no large difference between the MSEs of the optimal and positive-part optimal ridge estimates. Other similar tables of results may be found in (Mitra 1977).

4.2 Frequency Comparisons.

The empirical MSE is a useful measure of performance. However, other measures may be more informative, especially if the distribution of $L(k)$ is highly skewed. The sampling distributions and certain fractiles of $L(k)$ for the various estimators were studied in Mitra (1977), where we found some cases in which $L(k)$ is smaller than its OLS counterpart $L(0)$ a large percentage of the time, while the MSE of $\hat\beta(0)$ is smaller than that of $\hat\beta(k)$. We present a squared-loss comparison with the OLS estimates in Table 5, which shows the approximate percentage of times each estimator has a smaller squared loss than the OLS. McDonald and Galarneau (1975) carried out a similar analysis. However, their study did not consider all of the estimators that we simulated. Moreover, the effects of the degree of multicollinearity and of $\beta$ can be observed from our results. From Table 5 we find that all of the estimators, with the exception of MG, outperform the OLS estimator a large fraction of the time. For high multicollinearity, all the ridge estimators except MG have a smaller loss almost 100% of the time. The effect of $\alpha$ can be seen in that, for low multicollinearity, the estimators have a smaller loss than the OLS a smaller proportion of the time than when $\alpha$ is high. The effect of $\beta$ can also be observed from Table 5 and agrees with intuition: with the exception of B and B+, all the estimators perform better when $\beta$ is most favorable than when $\beta$ is least favorable.
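The point that an estimator can beat OLS in most replications yet have the larger MSE is easy to see with a toy example. The loss values below are made up purely for illustration: the estimator wins 9 of 10 replications by a little and loses once by a lot.

```python
import numpy as np

def frequency_and_mse(loss_est, loss_ols):
    """Fraction of replications with L(k) < L(0), and both empirical MSEs."""
    loss_est = np.asarray(loss_est, dtype=float)
    loss_ols = np.asarray(loss_ols, dtype=float)
    win_rate = float(np.mean(loss_est < loss_ols))
    return win_rate, float(loss_est.mean()), float(loss_ols.mean())

# Made-up losses over 10 replications.
loss_ols = [1.0] * 10
loss_est = [0.5] * 9 + [10.0]
print(frequency_and_mse(loss_est, loss_ols))  # (0.9, 1.45, 1.0)
```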

4.3 MSE Comparisons

One of the common measures of performance is the mean squared error, given by

$$\mathrm{MSE} = E[L(k)] = E[(\hat\beta(k) - \beta)'(\hat\beta(k) - \beta)]. \qquad (4.1)$$

For each method and each combination of the parameters $p$, $\sigma^2$, $\alpha$, and $\beta$, the empirical MSE is computed over one hundred replications.
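A minimal Monte Carlo sketch of the empirical MSE in one design cell follows, comparing OLS with a ridge estimate whose constant is $k = p\hat\sigma^2/\hat\beta'\hat\beta$. This constant is our reading of the Hoerl-Kennard-Baldwin rule cited in Table 1, not code from the study. The sketch also reports the theoretical OLS MSE $\sigma^2 \operatorname{tr}(X'X)^{-1} = \sigma^2\sum_i 1/\lambda_i$ for comparison. The design, seed, and $\sigma^2 = 0.1$ are hypothetical.

```python
import numpy as np

def hkb_ridge(X, Y):
    """OLS b, ridge estimate with k = p * s^2 / (b'b), and k (our HKB reading)."""
    n, p = X.shape
    XtX, XtY = X.T @ X, X.T @ Y
    b = np.linalg.solve(XtX, XtY)
    resid = Y - X @ b
    s2 = float(resid @ resid) / (n - p)
    k = p * s2 / float(b @ b)
    return b, np.linalg.solve(XtX + k * np.eye(p), XtY), k

def empirical_mse(X, beta, sigma2, reps, rng):
    """Average squared loss of OLS and HKB over `reps` simulated error vectors."""
    n = X.shape[0]
    loss_ols, loss_hkb = np.empty(reps), np.empty(reps)
    for r in range(reps):
        Y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
        b, b_hkb, _ = hkb_ridge(X, Y)
        loss_ols[r] = np.sum((b - beta) ** 2)
        loss_hkb[r] = np.sum((b_hkb - beta) ** 2)
    expected_ols = sigma2 * np.trace(np.linalg.inv(X.T @ X))  # sigma^2 * sum 1/lambda_i
    return loss_ols.mean(), loss_hkb.mean(), expected_ols

# Hypothetical design: p = 3 equicorrelated columns (rho = 0.9), unit-length columns.
rng = np.random.default_rng(11)
n, p, rho = 100, 3, 0.9
Z = rng.standard_normal((n, p + 1))
X = np.sqrt(1 - rho) * Z[:, :p] + np.sqrt(rho) * Z[:, [p]]
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
beta = np.linalg.eigh(X.T @ X)[1][:, -1]     # "most favorable" beta
print(empirical_mse(X, beta, sigma2=0.1, reps=100, rng=rng))
```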

Following Gunst and Mason (1977), an analysis of variance procedure was adopted to determine the effects of the controlled parameters, using the logarithm of the empirical mean squared error as the response. The main effects of $\alpha$, $p$, and $\sigma^2$ are highly significant for all the methods. The effect of $\beta$ is also quite significant, except for the OLS and V+ estimators. For the OLS this is to be expected, since its MSE, $\sigma^2 \sum_i 1/\lambda_i$, does not depend on $\beta$. Also, the two-way interactions have significant effects for most of the estimators considered.

A sample table of results showing the empirical MSE of the different estimators, including the OLS and the optimal and positive-part optimal ridge estimators, may be found in Table 4. Other similar tables showing MSE results for the other parameter combinations are omitted from this paper and can be found in (Mitra 1977).

The effect of $\alpha$, a measure of the degree of multicollinearity, may be noticed from Table 4. As $\alpha$ increases, the MSE of the various estimators tends to increase, as one would intuitively expect. The relative performance of some of the estimators is also strongly affected by $\alpha$, as can be seen from Tables 7 and 8. The two most noticeable ones are GM and M, which are among the "worst MSE" class when $\alpha$ is small but, dramatically, among the "best MSE" class when $\alpha$ is large.
The effect of $\sigma^2$, the variance of the error term, also behaves as expected: as $\sigma^2$ becomes small, the MSE of the various estimators decreases. For very small values of $\sigma^2$, say $\sigma^2 = 0.001$, there is not much difference in the actual magnitudes of the MSE of the different estimators. This suggests that for small $\sigma^2$ one may use the OLS estimator and obtain an MSE comparable to that of the ridge or other biased estimators.

The unknown coefficient vector $\beta$ has an effect on the MSE of the optimal ridge estimates. When $\beta$ is most favorable, the MSE of the optimal ridge estimate tends to decrease as $\alpha$ increases. Hence, when $\beta$ is most favorable, the ratio of the MSE of any chosen estimator to the MSE of the optimal ridge estimator increases as $\alpha$ increases. On the other hand, when $\beta$ is least favorable, the MSE of the optimal ridge estimator increases as $\alpha$ increases, just as the MSE of the other estimators does. The effect of $\beta$ on the percentage of times an estimator does better than the OLS in squared loss was discussed in Section 4.2.

The number of independent variables, $p$, does not exhibit any general pattern of influence on the behavior of all the estimators. However, the behavior of some specific estimators seems to depend on $p$. For example, in terms of MSE, B and B+ do well when $p$ is large, and these estimators move from the "worst MSE" class to the "best MSE" class as $p$ changes from small to large. The reverse holds for the estimator F.



4.4 Empirical Rank Minimax Analysis

Frequently one is interested in obtaining an estimator that is minimax with respect to some measure of performance. A minimax estimator would assure the user that its worst performance is better than the worst performance of each of the other estimators. Using the empirical MSE as the measure of performance, we found the maximum rank of the MSEs of the various estimators over the parameter choices $p = 3, 6, 10$; $\alpha = 1, 2, 5, 10, 50, 100$; $\sigma^2 = 1.0, 0.1$; and $\beta$ (most and least favorable). Table 6 shows the maximum rank of the MSEs of the different estimators. We observe that, from a minimax point of view, the M2 and M1 estimators, as well as the HKB and HK estimators, seem most favorable. The latter finding seems consistent with one of the findings of Dempster, Schatzoff and Wermuth (1977).
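The rank-minimax rule can be sketched as follows: rank the estimators by MSE within each design cell, take each estimator's worst (maximum) rank over all cells, and select the estimator(s) whose worst rank is smallest. The MSE matrix below is made up for illustration. Estimator "B" is never the best in any cell but is chosen because it is never the worst.

```python
import numpy as np

def rank_minimax(mse, names):
    """mse[c, j] = empirical MSE of estimator j in design cell c.

    Returns each estimator's maximum rank (1 = smallest MSE; ties broken by
    column order) and the estimator(s) whose maximum rank is smallest.
    """
    mse = np.asarray(mse, dtype=float)
    ranks = mse.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable") + 1
    max_rank = ranks.max(axis=0)
    best = [names[j] for j in np.flatnonzero(max_rank == max_rank.min())]
    return dict(zip(names, max_rank.tolist())), best

names = ["A", "B", "C"]
mse = [[1.0, 2.0, 3.0],
       [3.0, 2.0, 1.0],
       [1.0, 2.0, 3.0]]
print(rank_minimax(mse, names))  # ({'A': 3, 'B': 2, 'C': 3}, ['B'])
```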

4.5 Guideline Tables

Using the simulation results, we now devise, as a rule of thumb, a procedure for choosing estimators that may be preferred, as well as those that may be avoided, under the various combinations of parameter conditions. For this purpose, we divide the level of multicollinearity into two parts: small $\alpha$ ($\alpha < 3$) and large $\alpha$ ($\alpha \ge 3$). The variance of the error term, $\sigma^2$, is partitioned into two levels, corresponding to the regions $\sigma^2 < 0.1$ and $\sigma^2 > 0.1$. Similarly, the number of independent variables, $p$, is divided into two parts. Under this structure, the MSEs and their associated ranks for the various estimators were used to form the guideline tables. The average ranks of the MSEs of each of the fourteen simulated estimators (including the OLS) under this partition were computed and used as the basis for identifying the estimators in the "best MSE" class and those in the "worst MSE" class. Table 7 shows the guidelines for single-parameter estimators, while Table 8 shows those for multi-parameter estimators. The OLS estimator is not included in the tables, since for our range of controlled parameters the OLS estimate would always fall within the "worst MSE" class.
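A sketch of how such guideline classes could be formed from average MSE ranks within each partition follows. Several details are our own assumptions, labeled in the code: the cut point `p_split` for "small" versus "large" $p$, the class size `n_class`, and the assignment of $\sigma^2 = 0.1$ itself to the upper group. The text does not state these choices. The MSE values are made up.

```python
import numpy as np
from collections import defaultdict

def guideline_classes(cells, mse, names, p_split, n_class=3):
    """Best/worst classes from average MSE ranks within each partition.

    cells   : one (p, alpha, sigma2) tuple per row of `mse`
    mse     : array (n_cells, n_estimators) of empirical MSEs
    p_split : hypothetical cut point; p <= p_split counts as "small p"
    n_class : hypothetical size of the "best" and "worst" classes
    """
    mse = np.asarray(mse, dtype=float)
    ranks = mse.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable") + 1
    groups = defaultdict(list)
    for row, (p, alpha, sigma2) in enumerate(cells):
        key = ("alpha<3" if alpha < 3 else "alpha>=3",
               # Assumption: sigma2 = 0.1 goes to the upper group.
               "sigma2<0.1" if sigma2 < 0.1 else "sigma2>=0.1",
               "small p" if p <= p_split else "large p")
        groups[key].append(ranks[row])
    classes = {}
    for key, rows in groups.items():
        avg = np.mean(rows, axis=0)
        order = np.argsort(avg, kind="stable")
        best = [names[j] for j in order[:n_class]]
        worst = [names[j] for j in order[::-1][:n_class]]
        classes[key] = (best, worst)
    return classes

names = ["A", "B", "C", "D"]
cells = [(3, 1, 1.0), (3, 2, 1.0), (10, 50, 0.01)]
mse = [[1.0, 2.0, 3.0, 4.0],
       [1.0, 3.0, 2.0, 4.0],
       [4.0, 3.0, 2.0, 1.0]]
print(guideline_classes(cells, mse, names, p_split=6, n_class=1))
```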

Some general observations may be drawn from these guideline tables. Irrespective of the other parameters, GM and M are in the "worst MSE" class for small $\alpha$, while for large $\alpha$ they are in the "best MSE" class. B and B+ are among the best when $p$ is large, while M2, M1, HKB, and JS+ are among the best when $\alpha$ is small. In view of the marked dependence of the qualities of the various estimators on the parameter space of the problem, we believe Tables 7 and 8 will aid the user of ridge estimation methods in choosing a favorable method for a particular problem.


5. CONCLUDING REMARKS

In this study we considered the performance of some ridge and other biased estimators under a wide range and combination of controlled parameter values. For a given regression problem, $p$ is known, $\alpha$ can be computed from the eigenvalues of $X'X$ via (3.3), and $\sigma^2$ can be estimated, so that one may use the guideline tables to select an estimation method for $\beta$.

In view of the extensive nature of this study, we feel that our results, bolstered by supporting conclusions from other published simulation studies, enable us to recommend with confidence a method of choosing, from a large class of estimators, a small number of promising candidates on the basis of certain characteristics of the particular problem.

For each set of simulated data, the squared loss of the estimate of $\beta$ from each estimation method was compared with two common calibrations: the squared loss of the OLS $\hat\beta$ and that of the optimal ridge estimate $\hat\beta(k^*)$. The former lets us gauge the magnitude of the squared-loss improvement over the OLS estimate. The latter, being the absolute minimum squared loss attainable by a single-parameter ridge estimate of $\beta$, enables us to obtain an empirical relative efficiency for each method and to observe whether any of the other estimators can achieve a smaller squared loss (or a smaller average squared loss) than the minimum that could possibly be achieved by a member of the single-parameter ridge class of estimators.


APPENDIX 


The computer programs used in this study were coded by A. Mitra. Computations were performed in double precision on an IBM/370 Model 165 machine, with programs coded in FORTRAN and compiled by WATFIV (Version I, Level 5).

The reported simulation results were based on the subroutines RANDU and GAUSS from the IBM Scientific Subroutine Package (1970) for pseudorandom number generation. In addition, the statistical reliability of certain results (including Table 4) was verified using better uniform and normal generators.

The uniform pseudorandom number generator was

$$X_{i+1} = 764{,}261{,}123\, X_i \pmod{2^{31} - 1},$$

which was reported by Hoaglin (1976) to have excellent spectral and lattice properties. Its spectral numbers are $C_2 = 1.94$, $C_3 = 2.10$, $C_4 = 2.58$, $C_5 = 4.06$, and $C_6 = 2.55$, and its lattice numbers are less than 2 for $i = 2, \ldots, 6$. Random normal deviates were generated by applying the Box-Muller (1958) transformation to the uniform (0, 1) numbers produced by the Hoaglin generator.
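For readers who want to reproduce the verification generators, here is a sketch in Python rather than the original FORTRAN. It implements the multiplicative congruential generator above and the Box-Muller transform applied to consecutive pairs of uniforms. The function names and seeds are ours. Because the modulus is prime and the seed lies in $1, \ldots, 2^{31}-2$, no output is ever 0, so the logarithm in Box-Muller is always defined.

```python
import math

M = 2**31 - 1        # modulus 2^31 - 1
A = 764_261_123      # multiplier quoted above

def lehmer_uniforms(seed, count):
    """X_{i+1} = A X_i mod M, returned as uniforms X_{i+1} / M in (0, 1)."""
    if not 0 < seed < M:
        raise ValueError("seed must be in 1..M-1")
    x, out = seed, []
    for _ in range(count):
        x = (A * x) % M
        out.append(x / M)
    return out

def box_muller(u1, u2):
    """Box-Muller (1958): two independent N(0, 1) deviates from two uniforms."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def normal_deviates(seed, count):
    """Normal deviates from consecutive pairs of Lehmer uniforms."""
    u = lehmer_uniforms(seed, 2 * ((count + 1) // 2))
    z = []
    for i in range(0, len(u), 2):
        z.extend(box_muller(u[i], u[i + 1]))
    return z[:count]

print(lehmer_uniforms(12345, 3))
print(normal_deviates(12345, 3))
```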
Table 1. Estimators Considered in Study

HKB    : Hoerl, Kennard and Baldwin (1975, eq. 2.2)
F      : Farebrother (1975, eq. 6)
M      : Mallows (1973, p. 673), Farebrother (1975, eq. 11)
MG     : McDonald and Galarneau (1975, Rule R2)
M1     : (1.3) and (2.2)
M2     : (1.3) and (2.3)
JS+    : James and Stein (1961), Vinod (1976, eq. 13, p. 6)
HK     : Hoerl and Kennard (1970 a, p. 63)
GM     : Guilkey and Murphy (1975, p. 770, DRE1, $\lambda_i < 0.1\,\lambda_{\max}$)
B, B+  : Bhattacharya (1966), Vinod (1976, eqs. 19, 21)
V, V+  : Vinod (1976, eqs. 22, 23)
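To illustrate two entries of the list of estimators above, the sketch below computes the HKB single-parameter ridge estimate and the HK multi-parameter (generalized ridge) estimate. It assumes the commonly cited forms k = p s²/(β̂'β̂) for HKB and k_i = s²/α̂_i² for HK, where s² is the OLS residual mean square, X'X = PΛP' is the eigendecomposition, and α̂ = P'β̂ is the OLS estimate in canonical coordinates; these forms should be checked against the cited equations. The function names and the toy data are hypothetical.

```python
import numpy as np

def ols(X, Y):
    """OLS estimate (1.2) and residual mean square s^2 = RSS / (n - p)."""
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ b
    return b, float(resid @ resid) / (n - p)

def hkb(X, Y):
    """Ridge estimate (1.3) with the assumed HKB choice k = p s^2 / (b'b)."""
    p = X.shape[1]
    b, s2 = ols(X, Y)
    k = p * s2 / float(b @ b)
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y), k

def hk_generalized(X, Y):
    """Generalized ridge in canonical form with the assumed HK choice
    k_i = s^2 / a_i^2, where X'X = P diag(lam) P' and a = P'b."""
    b, s2 = ols(X, Y)
    lam, P = np.linalg.eigh(X.T @ X)    # eigenvalues ascending, columns of P orthonormal
    a = P.T @ b                         # OLS estimate in canonical coordinates
    k = s2 / a**2                       # one shrinkage constant per component
    a_ridge = lam / (lam + k) * a       # (Lambda + K)^{-1} Lambda a
    return P @ a_ridge, k

# Hypothetical collinear design: the last column is nearly x1 + x2.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 4))
X[:, 3] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(40)
beta = np.array([1.0, 2.0, -1.0, 0.5])
Y = X @ beta + rng.standard_normal(40)

b_ols, s2 = ols(X, Y)
b_hkb, k_hkb = hkb(X, Y)
b_hk, k_hk = hk_generalized(X, Y)
print("OLS:", b_ols, "\nHKB:", b_hkb, "k =", k_hkb, "\nHK:", b_hk)
```

Working in canonical coordinates makes the multi-parameter case a per-component shrinkage by λ_i/(λ_i + k_i); the sketch does not guard against an exactly zero α̂_i, which would make k_i infinite.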
[Table entries not recoverable]

OLS (Expected)   3.32   6.76   17.46   35.47   180.70   362.65


Approximate Percentage of Times Estimators Have Smaller Squared Loss than OLS

                   Low Multicollinearity   High Multicollinearity
                   (α = 1, 2)              (α = 5, 10, 50, 100)
                   β Most      β Least     β Most      β Least
                   Favorable   Favorable   Favorable   Favorable

Single-Parameter:
  Ridge:
    HKB            96^a        89          100         98
    F              98          91          100         98
    M              88          84          100         98
    MG             59          54          46          44
    M1             96          90          100         98
    M2             95          90          100         98
  Others:
    JS+            95          92          100         98
Multi-Parameter:
  Ridge:
    HK             99          94          100         98
    GM             37          33          100         97
  Others:
    B              88          94          90          95
    B+             92          97          93          99
    V              99          85          99          95
    V+             100         85          99          95

^a Average of the performance criterion (% of times loss of estimator < loss of OLS) over all combinations of parameters: p = 3, 6, 10; σ² = 1.0, 0.1.
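The performance criterion in the preceding table (the percentage of replications in which an estimator's squared loss is below that of OLS) can be estimated by Monte Carlo, as in the sketch below. Everything about the design is a hypothetical assumption: the collinear X, the error standard deviation, the number of replications, and the use of the HKB form k = p s²/(β̂'β̂). β is placed along the eigenvector of X'X with the largest or smallest eigenvalue; these orientations are commonly described in the ridge literature as most and least favorable, but the study's own construction may differ.

```python
import numpy as np

def ols_and_hkb(X, Y):
    """Return the OLS estimate (1.2) and the ridge estimate (1.3) with the
    assumed HKB choice k = p s^2 / (b'b)."""
    n, p = X.shape
    XtX, XtY = X.T @ X, X.T @ Y
    b = np.linalg.solve(XtX, XtY)
    resid = Y - X @ b
    s2 = float(resid @ resid) / (n - p)
    k = p * s2 / float(b @ b)
    return b, np.linalg.solve(XtX + k * np.eye(p), XtY)

def percent_better_than_ols(X, beta, sigma, n_reps, seed=0):
    """Percentage of replications in which the ridge squared loss is
    strictly smaller than the OLS squared loss."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(n_reps):
        Y = X @ beta + sigma * rng.standard_normal(X.shape[0])
        b_ols, b_ridge = ols_and_hkb(X, Y)
        if np.sum((b_ridge - beta) ** 2) < np.sum((b_ols - beta) ** 2):
            wins += 1
    return 100.0 * wins / n_reps

# Hypothetical collinear design (not the study's): column 3 is nearly x1 + x2.
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(30)
lam, P = np.linalg.eigh(X.T @ X)
beta_top = P[:, -1]     # unit eigenvector of the largest eigenvalue
beta_bottom = P[:, 0]   # unit eigenvector of the smallest eigenvalue
for label, beta in [("largest-eigenvalue direction", beta_top),
                    ("smallest-eigenvalue direction", beta_bottom)]:
    print(label, percent_better_than_ols(X, beta, sigma=1.0, n_reps=500))
```

Fixing the seed makes the percentage reproducible, so different estimators can be compared on identical replications, which is how a table like the one above is normally produced.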


Table 6. Maximum Rank of the Mean Squared Error of Estimators

^a Maximum rank over all combinations of parameters: p = 3, 6, 10; α = 1, 2, 5, 10, 50, 100; σ² = 1.0, 0.1; β = most favorable and least favorable.


Table 7. Guideline for Choosing Single-Parameter Estimators

Columns: Low Multicollinearity (α = 1, 2) and High Multicollinearity (α = 5, 10, 50, 100), each split into Small variance (σ² ≤ 0.1) and Large variance (σ² > 0.1), and each of these into p = 3, 6 and p = 10.

"Best MSE": HKB(3+)^a, HKB(5), JS+(3), JS+(2+), M(2), M(2+), M(2+), M(3+), M2(6), HKB(1+), M2(4), M1(6), M2(3), M1(3+)

"Worst MSE": ?(12), F(11+), M(13), F(12), MG(9), F(13), MG(9+), F(13), MG(10+), MG(9), MG(10+), MG(10+)

^a Numbers in parentheses represent ranks of MSE averaged over the corresponding combination of parameters and β = most favorable and least favorable.
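The rank summaries used in the preceding tables (the maximum rank of MSE, and the rank of MSE averaged over parameter combinations) can be computed as sketched below. This is a minimal illustration under stated assumptions: estimators are ranked within each parameter combination (rank 1 = smallest MSE), tied estimators share the average of their positions (the text does not say how ties were handled), and the MSE values are hypothetical toy numbers, not study data. The function names are illustrative.

```python
def ranks_within_config(mse):
    """Rank estimators by MSE within one configuration (1 = smallest MSE);
    tied estimators share the average of their positions."""
    names = sorted(mse, key=mse.get)
    ranks = {}
    i = 0
    while i < len(names):
        j = i
        while j + 1 < len(names) and mse[names[j + 1]] == mse[names[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for name in names[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks

def summarize_ranks(configs):
    """configs: list of {estimator: MSE} dicts, one per parameter combination.
    Returns {estimator: (average rank, maximum rank)}."""
    per_config = [ranks_within_config(c) for c in configs]
    summary = {}
    for name in per_config[0]:
        rs = [r[name] for r in per_config]
        summary[name] = (sum(rs) / len(rs), max(rs))
    return summary

# Hypothetical MSE values for three estimators in two configurations (not study data).
configs = [
    {"A": 1.0, "B": 2.0, "C": 3.0},
    {"A": 2.5, "B": 2.0, "C": 3.0},
]
print(summarize_ranks(configs))  # {'A': (1.5, 2.0), 'B': (1.5, 2.0), 'C': (3.0, 3.0)}
```

Averaging ranks rather than raw MSE keeps configurations with large losses (for example, high multicollinearity) from dominating the summary.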
Table 8. Guideline for Choosing Multi-Parameter Estimators

Columns: Low Multicollinearity (α = 1, 2) and High Multicollinearity (α = 5, 10, 50, 100), each split into Small variance (σ² ≤ 0.1) and Large variance (σ² > 0.1), and each of these into p = 3, 6 and p = 10.

"Best MSE": HK(2)^a, B+(1+), HK(4+), B+(2), GM(1+), GM(1+), GM(1), GM(1+), B+(2+), B+(?)

"Worst MSE": GM(11+), GM(10), GM(11+), GM(10), B(10), V(10), B(10+), V(11), V(10+), V(10+), V(10+), V(11), B+(9), ?+(11), B+(9), V+(?), V+(10+), V+(10), V+(10+), V+(11+), V(11?), V(11+), V+(11+), V+(12)

^a Numbers in parentheses represent ranks of MSE averaged over the corresponding combination of parameters and β = most favorable and least favorable.